
fix(geneformer): resolve protobuf conflict with nvidia-resiliency-ext>=0.6.0 #1598

Merged: pstjohn merged 1 commit into NVIDIA-BioNeMo:main from svc-bionemo:svc-bionemo/fix-nightly-20260603-7bb88738 on Jun 5, 2026

svc-bionemo (Collaborator) commented Jun 3, 2026

Problem

The geneformer nightly CI fails because megatron-core==0.17.1 asserts nvidia-resiliency-ext>=0.6.0 at import time, but that version isn't installed in the CI container.
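To make the failure mode concrete, an import-time minimum-version guard of the kind described above can be sketched with the standard library's importlib.metadata. This is a hypothetical illustration, not megatron-core's actual check: the helper names (release_tuple, installed_release, require_min_version) are ours, and the simplified version parsing ignores pre-release and post-release suffixes, which packaging.version would handle properly.

```python
import re
from importlib.metadata import PackageNotFoundError, version
from typing import Optional, Tuple

# Minimum nvidia-resiliency-ext release cited in this PR.
MIN_NVRX: Tuple[int, int, int] = (0, 6, 0)


def release_tuple(v: str) -> Tuple[int, ...]:
    """Leading numeric release of a version string, padded to 3 parts.

    Simplification: suffixes such as 'rc1' or '.post1' are ignored, so
    '0.6.0rc1' compares equal to '0.6.0' (PEP 440 would order it lower).
    """
    m = re.match(r"\d+(?:\.\d+)*", v)
    if m is None:
        raise ValueError(f"unparseable version: {v!r}")
    nums = [int(x) for x in m.group(0).split(".")]
    nums += [0] * (3 - len(nums))  # e.g. '0.6' -> (0, 6, 0)
    return tuple(nums)


def installed_release(dist: str) -> Optional[Tuple[int, ...]]:
    """Installed release of a distribution, or None if it is not installed."""
    try:
        return release_tuple(version(dist))
    except PackageNotFoundError:
        return None


def require_min_version(dist: str, minimum: Tuple[int, ...]) -> None:
    """Raise ImportError unless dist is installed at or above minimum."""
    found = installed_release(dist)
    if found is None or found < minimum:
        wanted = ".".join(map(str, minimum))
        have = "none" if found is None else ".".join(map(str, found))
        raise ImportError(f"{dist}>={wanted} is required, found {have}")
```

Calling require_min_version("nvidia-resiliency-ext", MIN_NVRX) at import time would raise ImportError in a container that lacks the package or ships an older release, which matches the symptom described for the nightly CI.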

Simply adding nvidia-resiliency-ext>=0.6.0 to pyproject.toml creates an unresolvable pip conflict, because the two dependency chains below require non-overlapping protobuf versions:

  • nvidia-resiliency-ext 0.6.0 → grpcio-tools>=1.76.0 → protobuf>=6.30.0
  • nemo-toolkit==2.4.0 → protobuf~=5.29.5
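Why no resolver can satisfy both chains: PEP 440 defines protobuf~=5.29.5 as >=5.29.5, ==5.29.*, i.e. the half-open range [5.29.5, 5.30.0), which cannot overlap protobuf>=6.30.0. The sketch below models each specifier as an interval over (major, minor, patch) tuples. It is a deliberate simplification (three-component versions only; no pre-releases, exclusions, or local versions), and the helper names are hypothetical; real tooling would use packaging.specifiers.SpecifierSet.

```python
from typing import Optional, Tuple

Version = Tuple[int, int, int]
# Half-open interval [lower, upper); upper=None means unbounded above.
Range = Tuple[Version, Optional[Version]]


def parse(v: str) -> Version:
    """Parse a plain 'X.Y.Z' version into a comparable tuple."""
    major, minor, patch = (int(p) for p in v.split("."))
    return (major, minor, patch)


def at_least(v: str) -> Range:
    """'>=v' becomes [v, +inf)."""
    return parse(v), None


def compatible_release(v: str) -> Range:
    """'~=X.Y.Z' becomes [X.Y.Z, X.(Y+1).0), per PEP 440 ('>=X.Y.Z, ==X.Y.*')."""
    major, minor, patch = parse(v)
    return (major, minor, patch), (major, minor + 1, 0)


def intersect(a: Range, b: Range) -> Optional[Range]:
    """Overlap of two ranges, or None if no version satisfies both."""
    lower = max(a[0], b[0])
    uppers = [u for u in (a[1], b[1]) if u is not None]
    upper = min(uppers) if uppers else None
    if upper is not None and lower >= upper:
        return None
    return lower, upper


nemo_pin = compatible_release("5.29.5")  # nemo-toolkit==2.4.0 -> protobuf~=5.29.5
grpc_pin = at_least("6.30.0")  # grpcio-tools>=1.76.0 -> protobuf>=6.30.0
print(intersect(nemo_pin, grpc_pin))  # -> None
```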

Fix

Add a .ci_build.sh that installs nvidia-resiliency-ext>=0.6.0 with --no-deps (skipping grpcio-tools, which is not needed for test execution), then installs the package normally.
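A minimal sketch of the two install steps described above, written as Python command lists so they can be inspected (or dry-run) without touching the environment. The actual contents of .ci_build.sh are not reproduced in this PR description; the function names are hypothetical, and reading "installs the package normally" as an editable install of the current directory (pip install -e .) is an assumption.

```python
import subprocess
import sys
from typing import List


def ci_build_commands(python: str = sys.executable) -> List[List[str]]:
    """Hypothetical model of the two steps .ci_build.sh performs."""
    pip = [python, "-m", "pip", "install"]
    return [
        # Step 1: install nvidia-resiliency-ext WITHOUT its dependencies, so pip
        # never pulls in grpcio-tools (and with it protobuf>=6.30.0).
        pip + ["--no-deps", "nvidia-resiliency-ext>=0.6.0"],
        # Step 2 (assumed editable install): install the package itself,
        # resolving its own dependencies normally.
        pip + ["-e", "."],
    ]


def run(commands: List[List[str]], dry_run: bool = True) -> None:
    """Print each command (dry run) or execute it, stopping on the first failure."""
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run(ci_build_commands())
```

The design point is that --no-deps applies only to the first command: nvidia-resiliency-ext is present for megatron-core's import-time check, while the package itself is still installed with normal dependency resolution.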

The CI workflow already supports .ci_build.sh as a hook: if the file exists, the workflow runs it instead of the default pip install -e . command.
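The hook behavior (run .ci_build.sh if it exists, otherwise fall back to pip install -e .) can be sketched as a small dispatch function. This is a hypothetical model of the CI workflow logic, not its actual implementation; the function names and the choice to invoke the script via bash from the package directory are assumptions.

```python
import subprocess
import sys
from pathlib import Path
from typing import List, Union


def install_command(package_dir: Union[str, Path]) -> List[str]:
    """Return the install command to run from inside package_dir."""
    if (Path(package_dir) / ".ci_build.sh").is_file():
        # Custom hook present: it takes full control of installation.
        return ["bash", ".ci_build.sh"]
    # Default path: editable install with normal dependency resolution.
    return [sys.executable, "-m", "pip", "install", "-e", "."]


def install(package_dir: Union[str, Path]) -> None:
    """Run the selected command with package_dir as the working directory."""
    subprocess.run(install_command(package_dir), cwd=package_dir, check=True)
```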

Root Cause

nvidia-resiliency-ext 0.6.0 added grpcio and grpcio-tools as hard dependencies (for its new gRPC-based fault tolerance features). Through grpcio-tools>=1.76.0 these require protobuf>=6.30.0, which is incompatible with nemo-toolkit 2.4.0's protobuf~=5.29.5 pin. The conflict should resolve once nemo-toolkit relaxes its protobuf constraint.

copy-pr-bot (bot) commented Jun 3, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.


coderabbitai (bot, Contributor) commented Jun 3, 2026

Important

Review skipped

Auto reviews are disabled on this repository. Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

pstjohn (Collaborator) commented Jun 3, 2026

/ok to test 3e37d22

pstjohn enabled auto-merge June 3, 2026 13:49
…>=0.6.0

megatron-core==0.17.1 requires nvidia-resiliency-ext>=0.6.0 at runtime,
but nvidia-resiliency-ext 0.6.0 pulls in grpcio-tools>=1.76.0 which
requires protobuf>=6.30.0 — conflicting with nemo-toolkit==2.4.0 pinning
protobuf~=5.29.5.

Fix by using a .ci_build.sh script that installs nvidia-resiliency-ext
with --no-deps (skipping grpcio-tools) and then installs the package
normally. grpcio-tools is not needed for geneformer test execution.

Signed-off-by: svc-bionemo <267129667+svc-bionemo@users.noreply.github.com>

auto-merge was automatically disabled June 3, 2026 15:25

Head branch was pushed to by a user without write access

svc-bionemo force-pushed the svc-bionemo/fix-nightly-20260603-7bb88738 branch from 3e37d22 to 9fe6df5 June 3, 2026 15:25
svc-bionemo changed the title from "fix(geneformer): add nvidia-resiliency-ext>=0.6.0 dependency" to "fix(geneformer): resolve protobuf conflict with nvidia-resiliency-ext>=0.6.0" Jun 3, 2026
pstjohn added this pull request to the merge queue Jun 5, 2026
Merged via the queue into NVIDIA-BioNeMo:main with commit 4e66543 Jun 5, 2026
17 checks passed